Recognizing Multi-Talker Speech with Permutation Invariant Training

Authors

  • Dong Yu
  • Xuankai Chang
  • Yanmin Qian
Abstract

In this paper, we propose a novel technique for direct recognition of multiple speech streams given the single channel of mixed speech, without first separating them. Our technique is based on permutation invariant training (PIT) for automatic speech recognition (ASR). In PIT-ASR, we compute the average cross entropy (CE) over all frames in the whole utterance for each possible output-target assignment, pick the one with the minimum CE, and optimize for that assignment. PIT-ASR forces all the frames of the same speaker to be aligned with the same output layer. This strategy elegantly solves the label permutation problem and speaker tracing problem in one shot. Our experiments on artificially mixed AMI data showed that the proposed approach is very promising.
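The assignment search described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `pit_cross_entropy`, the array shapes, and the use of plain integer class labels are assumptions made for the example. In a real system the selected cross entropy would be the loss that is backpropagated through the acoustic model.

```python
import itertools
import numpy as np

def pit_cross_entropy(log_probs, targets):
    """Utterance-level permutation invariant cross entropy (illustrative sketch).

    log_probs: array of shape (S, T, C), log posteriors from S output layers
               over T frames and C classes (assumed layout).
    targets:   integer array of shape (S, T), frame labels for S speakers.
    Returns (min_ce, best_perm), where best_perm[i] is the index of the
    target stream assigned to output layer i.
    """
    num_streams, num_frames, _ = log_probs.shape
    frame_idx = np.arange(num_frames)
    best_ce, best_perm = np.inf, None
    # Try every possible output-target assignment.
    for perm in itertools.permutations(range(num_streams)):
        # Average CE over all frames of the whole utterance, for each
        # output layer paired with its assigned target stream.
        ce = -np.mean([
            log_probs[out, frame_idx, targets[tgt]].mean()
            for out, tgt in enumerate(perm)
        ])
        # Keep the assignment with the minimum CE.
        if ce < best_ce:
            best_ce, best_perm = ce, perm
    return best_ce, best_perm
```

Because the assignment is chosen once per utterance rather than per frame, every frame of a given speaker is tied to the same output layer, which is the property the abstract highlights. The search enumerates all S! permutations, which is cheap for the two- or three-talker case but grows quickly with more streams.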

Similar resources

Single-Channel Multi-talker Speech Recognition with Permutation Invariant Training

Although great progress has been made in automatic speech recognition (ASR), significant performance degradation is still observed when recognizing multi-talker mixed speech. In this paper, we propose and evaluate several architectures to address this problem under the assumption that only a single channel of mixed signal is available. Our technique extends permutation invariant training (PI...

Recognizing spoken vowels in multi-talker babble: spectral and visual speech cues

It has been proposed that both spectral and visual speech cues assist in segregating a talker from noise. To test how these cues interact, the experiment examined vowel identification (in hVd context) when presented in multi-talker babble. The availability of spectral cues was manipulated by filtering the signal into (1) 8 frequency amplitude-envelope bands or (2) the same bands with additional...

Differences in talker recognition by preschoolers and adults.

Talker variability in speech influences language processing from infancy through adulthood and is inextricably embedded in the very cues that identify speech sounds. Yet little is known about developmental changes in the processing of talker information. On one account, children have not yet learned to separate speech sound variability from talker-varying cues in speech, making them more sensit...

The effects of talker variability and variances on incidental learning of lexical tones

Multi-talker variability has been found to be very effective in the perception and production training of nonnative sound categories in the past few decades. The phonetic training paradigms were mostly explicit learning in which learners received feedback on the categories when exposed to the training stimuli. More recently, studies have started to investigate how auditory categories are learne...

Speaker-Invariant Training via Adversarial Learning

We propose a novel adversarial multi-task learning scheme, aiming at actively curtailing the inter-talker feature variability while maximizing its senone discriminability so as to enhance the performance of a deep neural network (DNN) based ASR system. We call the scheme speaker-invariant training (SIT). In SIT, a DNN acoustic model and a speaker classifier network are jointly optimized to mini...

Publication date: 2017